/* 
=========================================
FACTORS INFLUENCING HOUSEHOLD DEBT LEVELS 
Program name: DTI panel regression.do
Created by: David Norman
Date last modified: March 2020
Purpose: identify determinants of household debt levels across countries
Published in: August 2020
=========================================
 */

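* Session and memory settings
* Note: the -table ..., c()- calls near the end of this program use the pre-Stata 17 syntax; on Stata 17 or later they need to be run under version control (e.g. -version 16-)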
set more off
set matsize 5000
clear

* Import data
cd "{FILE STRUCTURE}"

capture log close	// close any log left open by an earlier run
log using "DTI determinants.smcl", replace

import excel using "stata_input.xlsx", firstrow

/*
**************************************
* VARIABLES USED IN THIS PROGRAM 
**************************************

DTI			Household debt-to-income ratio							%

y			Household income per capita							US$, nominal
y_r			Real household income per capita						US$, real
y_f			Consensus forecast of nom GDP growth (GDP*CPI), made in March of prior year	%
y_rf		Consensus forecast of real GDP growth, made in March of prior year		%
dPop		Population growth over past 2 years						%pa, only for cities with at least 300k people
d2Pop		Acceleration in population growth, past 2y minus 5y ave				%pa, only for cities with at least 300k people
dHP			Growth in real house prices over past 3 years					%pa
i			Nominal 10y bond rate								%
r			Nominal 10y bond minus past year CPI inflation					%
pi			Inflation rate, measured by CPI							%pa
PB			Banks' price-book ratios							Ratio
UR			Unemployment rate								%
stock		Dwelling stock per head of population						Ratio
spread		Interest rate spread of Baa to Aaa bonds					%pts

urban		Share of population living in cities (that had at least 1m people in 2018)	%
pFall		Largest peak-trough house price fall in prior history (decays over time)		%
liberalise	Index of financial liberalisation, from IMF, updated by authors			Index
age*		Share of adult population aged <20y, 20-24y, 25-34y, 35-44y, 45-54y or 55-64y	%
tenure		Share of households living in fully or partly owned house			%
own			Share of housing stock owned by households (either rented or not)		%
tax			Index of owner-occupier interest deductibility, 1 = yes, 0 = no			Dummy
CapG		Index of whether capital gains on housing taxed, 1=yes, 0.5=discount, 0=no	Categorical
density		Population-weighted density of major cities					person/ha
ineq		Inequality, measured by Gini coefficient					
legal		Strength of legal rights index from IMF, 0-12 with higher being stronger	Index
*/

**************************************
* Format/transform data
**************************************

sort country year
egen countryID = group(country), label lname(country)
xtset countryID year
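
* A quick look at the panel structure may help at this point (a sketch added for illustration, not part of the original analysis).
* xtdescribe reports how many countries are in the panel and which years each one covers.
xtdescribe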

gen age_depend = ageU20 + age65plus
gen age_young = ageU20 + age25
forval i = 25(10)65 {
	gen age`i'A = age`i'/(100-ageU20)*100
	gen age`i'B = age`i'/(100-age_depend)*100
}

*gen newvar = real(UR)
*drop UR
*rename newvar UR

* Lag dHP
gen tmp = L.dHP
replace dHP = tmp
drop tmp

**************************************
* Graph/analyse data
**************************************
/*
foreach var of varlist y y_r y_rf dPop d2Pop pi i r liberalise age* pFall UR stock {
	twoway scatter DTI `var'
	graph save g_`var', replace
}

foreach var of varlist CapG tax density ineq legal own tenure urban {
	twoway scatter DTI `var' if year==29
	graph save g_`var', replace
}

sort country year
by country: gen d3DTI = DTI - DTI[_n-3]

foreach var of varlist PB dHP {
	twoway scatter d3DTI `var'
	graph save g_`var', replace
}
*/

* Descriptive stats
tabstat DTI y_r y_rf r pi dHP pFall dPop urban stock liberalise own legal density age_young age65plus if year>1, stats(mean sd min max)

* Check instrument validity
correlate DTI y_r y_rf r density dHP pi liberalise pFall dPop urban stock own legal PB tenure tax CapG ineq spread age?? age_depend boomers

* Create log variables
foreach var of varlist DTI y y_r y_rf liberalise pFall dPop density ineq legal own tenure urban PB stock {
	gen l_`var' = ln(`var')
}
replace l_pFall = ln(0.1) if pFall==0		// ln(0) is missing, so this keeps the zero-valued observations, which are few but meaningful. The impact is very small, as it affects only 2 years for the USA (1990 and 1991)
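
* Sketch of a sanity check on the zero replacement above (added for illustration; not in the original program):
* count the country-years where pFall is zero and confirm that none of them is left with a missing log value.
count if pFall == 0
assert !missing(l_pFall) if pFall == 0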

foreach var of varlist l_y_r UR pi i r spread l_liberalise dPop d2Pop l_pFall l_stock l_PB age* {
	regress l_DTI `var'
}

foreach var of varlist CapG tax l_density l_ineq l_legal l_own l_tenure l_urban {
	regress l_DTI `var' if year==29
}

gen l_pFallLag = L2.l_pFall

* Unit root/cointegration tests
* Note: (1) The Fisher test is chosen for unit roots because it allows an unbalanced panel and its asymptotics are based on T -> infinity; Choi (2001) recommends the inverse normal statistic as the best balance of size and power. (2) The Pedroni test is chosen for cointegration because it assumes different AR coefficients in each panel; Pedroni (2004) reports that the ADF statistic has the best power when T < 100
xtunitroot fisher l_DTI, pp l(1)
xtcointtest pedroni l_DTI r pi l_y_r l_liberalise

* Include dummy to deal with France issue in early years
gen dummy = 0
replace dummy = 1 if country=="FRA" & year<5

**************************************
* Begin regressions
**************************************

* Check that time-varying endogenous variables have sufficient within variation to act as their own instruments:
xtsum y_r r pi dPop dHP PB

* Main ("narrow") regression (drops less significant TI variables to ensure rho doesn't fall too far, and PB to maintain length of sample):
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal, endog(l_y_r r l_liberalise l_stock l_density)
estimates store est_narrow		
predict narrow_ue, ue
predict narrow_u, u
predict narrow_e, e
predict narrow_pv, xbu
predict narrow_se, stdp

* Dropping own to expand sample - "wide" regression:
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dummy, endog(l_y_r r l_liberalise l_stock l_density)
estimates store est_wide		
predict wide_ue, ue
predict wide_u, u
predict wide_e, e

* Dropping own, density and stock to get full sample (i.e. to include Chile, Korea and Japan)
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_liberalise age_young age65plus l_legal dummy, endog(l_y_r r l_liberalise)	
estimates store est_largeN	
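
* Side-by-side view of the three main specifications (a sketch added for convenience; it mirrors the estimates table calls used later in this program):
estimates table est_narrow est_wide est_largeN, star(0.1 0.05 0.01) stats(N) b(%9.3f)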

 	
**************************************
* ROBUSTNESS
**************************************

* 1. TEST FOR STRICT EXOGENEITY
* (For each time-varying variable, test whether the coefficient on its future (lead) value is significant; if so, the variable appears to be endogenous; see Wooldridge p325)
xtset countryID year
xtreg l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal f.l_y_r f.r f.pi f.l_liberalise f.dHP f.l_pFallLag, fe 
xtreg l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal f.l_y_r f.r f.pi f.l_liberalise f.dHP f.l_pFallLag, fe 
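
* One way to summarise the leads jointly (a sketch, not part of the original program): a Wald test that all
* lead coefficients in the regression immediately above are zero. A rejection would point away from strict exogeneity.
test F.l_y_r F.r F.pi F.l_liberalise F.dHP F.l_pFallLag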

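* Re-estimate model with an instrument for real income, given that it appears to be correlated with e_it:
gen l_prody = ln(Prody)
* Manual two-stage (2SLS-style) procedure: fitted values from first-stage FE regressions replace l_y_r and r
* Note: the second-stage standard errors are not adjusted for the generated regressors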
xtreg l_y_r l_prody r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal, fe
	predict IV_y, xbu
	xtreg r IV_y pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal, fe
	predict IV_r, xbu
	xthtaylor l_DTI IV_y IV_r pi dHP l_pFallLag l_liberalise age_young age65plus l_own l_legal l_density dPop l_urban l_stock, endog(IV_y IV_r l_liberalise l_density)
	estimates store est_IV_narrow
xtreg l_y_r l_prody r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal, fe
	predict IV_y2, xbu
	xtreg r IV_y2 pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal, fe
	predict IV_r2, xbu
	xthtaylor l_DTI IV_y2 IV_r2 pi dHP l_pFallLag l_liberalise age_young age65plus l_legal l_density dPop l_urban l_stock, endog(IV_y2 IV_r2 l_liberalise l_density)
	estimates store est_IV_wide

* 2. CHECK TIME TRENDS 

* Removing common time trend:
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal year, endog(l_y_r r l_liberalise l_stock l_density)		
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal year dummy, endog(l_y_r r l_liberalise l_stock l_density)		

* Remove country-specific time trend:
forval i = 1/22 {
	gen tt`i' = year if countryID==`i'
	replace tt`i' = 0 if countryID!=`i'
}
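
* The loop above hard-codes 22 countries. A minimal check (added here as a sketch) of how many the data actually contain:
quietly summarize countryID
display "Number of countries in countryID: " r(max)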
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal tt*, endog(l_y_r r l_liberalise l_stock l_density)	
			estimates store est_tti_narrow
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dummy tt*, endog(l_y_r r l_liberalise l_stock l_density)
			estimates store est_tti_wide

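* Do it using (common) time dummies instead:
* Note: the wildcards dum?? and dum* used below also match the France variable -dummy-, so it enters both regressions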
forval i = 1/29 {
	gen dum`i' = (year==`i')
}
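
* The dummies above cover year indices 1 to 29, while the tabulations at the end of this program refer to indices up to 39 (2018 data).
* A minimal check of the range actually present in the data (a sketch added for illustration):
quietly summarize year
display "Year index runs from " r(min) " to " r(max)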
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal dum? dum??, endog(l_y_r r l_liberalise l_stock l_density)	
	estimates store est_timefe_narrow
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dum*, endog(l_y_r r l_liberalise l_stock l_density)		
	estimates store est_timefe_wide
	
* Note: could also just check the FE regression for the time-varying (TV) variables:
xtreg l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_liberalise age_young age65plus dummy year, fe vce(robust)			
xtreg l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_liberalise age_young age65plus dummy i.date, fe vce(robust)
* observe: 
*	FE estimator with time trend makes r and pi insignificant. HT estimator with time trend (above) does the same
*	FE estimator with time dummies does the same, and removes significance of y

* 3. CHECK INCLUSION OF DIFFERENT VARIABLES

* a) Including everything (small N due to own; issue is that sigma_mu = 0; the second regression adds PB and the third drops own):
xthtaylor l_DTI l_y_r r spread pi UR dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal tax CapG l_ineq, endog(l_y_r r spread l_liberalise l_stock l_density)	
		estimates store est_all_narrow
xthtaylor l_DTI l_y_r r spread pi UR PB dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal tax CapG l_ineq, endog(l_y_r r spread l_liberalise l_stock l_density)		
xthtaylor l_DTI l_y_r r spread pi UR dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal tax CapG l_ineq dummy, endog(l_y_r r spread l_liberalise l_stock l_density)	
		estimates store est_all_wide

* b) Cycling through different TI variables, one at a time:
foreach var of varlist l_density l_tenure l_ineq {
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_legal dummy `var', endog(l_y_r r l_liberalise)		
	estimates store est_TIV`var'
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_legal l_own `var', endog(l_y_r r l_liberalise)		
	estimates store est_TIV`var'_narrow
}

estimates table est_TIV*, star(0.1 0.05 0.01) stats(F)

* c) Including additional time varying regressors:
quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_legal l_density UR spread, endog(l_y_r r l_liberalise l_density spread)		
	estimates store est_TV_narrow
quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_legal l_density dummy UR spread, endog(l_y_r r l_liberalise l_density spread)		
estimates store est_TV_wide
estimates table est_TV*, star(0.1 0.05 0.01) stats(F)

* d) Check combination of urbanisation, density, stock and population
foreach var of varlist l_stock l_density {
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag l_liberalise age_young age65plus l_own l_legal dummy `var', endog(l_y_r r l_liberalise `var')	
	estimates store est_`var'
}
foreach var of varlist l_urban dPop {
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag l_liberalise age_young age65plus l_own l_legal `var', endog(l_y_r r l_liberalise)
	estimates store est_`var'
}
estimates table est_l_urban est_l_stock est_l_density est_dPop, star(0.1 0.05 0.01)	stats(F)	

* e) Investigate tax variables:
* Remember: CapG = 1 means full taxation of capital gains; tax = 1 means full deductibility of interest; expect tax to have positive sign and CapG to have negative sign
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal tax CapG, endog(l_y_r r l_liberalise l_stock l_density)		
xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dummy tax CapG, endog(l_y_r r l_liberalise l_stock l_density)		

* create interaction variables
foreach var of varlist pi r i year {
	gen CapG`var' = CapG * `var'
	gen tax`var' = tax * `var'
}

foreach var of varlist CapGpi CapGr CapGi CapGyear taxpi taxr taxi taxyear {
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal `var', endog(l_y_r r l_liberalise l_stock l_density `var')
	estimates store est_`var'_narrow
}
estimates table est_CapG*_narrow, star(0.1 0.05 0.01)	stats(F)	
estimates table est_tax*_narrow, star(0.1 0.05 0.01)	stats(F)

foreach var of varlist CapGpi CapGr CapGi CapGyear taxpi taxr taxi taxyear {
	quietly xthtaylor l_DTI l_y_r r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus  l_density l_legal dummy `var', endog(l_y_r r l_liberalise l_stock l_density `var')
	estimates store est_`var'_wide
}
estimates table est_CapG*_wide, star(0.1 0.05 0.01)	stats(F)	
estimates table est_tax*_wide, star(0.1 0.05 0.01)	stats(F)	

* f) Include income expectations
quietly xthtaylor l_DTI l_y_r l_y_rf r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_own l_density l_legal, endog(l_y_r l_y_rf r l_liberalise l_stock l_density) // narrow sample		
	estimates store est_expect_narrow
quietly xthtaylor l_DTI l_y_r l_y_rf r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dummy, endog(l_y_r l_y_rf r l_liberalise l_stock l_density) // wide sample	
	estimates store est_expect_wide
quietly xthtaylor l_DTI l_y_r l_y_rf r pi dHP l_pFallLag dPop l_urban l_stock l_liberalise age_young age65plus l_density l_legal dummy year, endog(l_y_r l_y_rf r l_liberalise l_stock l_density) // wide sample with time trend	
	estimates store est_expect_widett
	
estimates table est_expect*, star(0.1 0.05 0.01) stats(F N)	
	
**************************************
* ADDITIONAL RESULTS TABLE
**************************************

estimates table est_wide est_IV_wide est_tti_wide est_timefe_wide est_expect_wide est_all_wide est_TV_wide , drop(tt* dum*) star(0.1 0.05 0.01) stats(N F) b(%9.3f)
estimates table est_narrow est_IV_narrow est_tti_narrow est_timefe_narrow est_expect_narrow est_all_narrow est_TV_narrow , drop(tt* dum*) star(0.1 0.05 0.01) stats(N F) b(%9.3f)
	
**************************************
* OUTPUT ADDITIONAL RESULTS
* To calculate contributions to differences across countries (using narrow sample)
**************************************

* For Australia:

tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country=="AUS" & year==39	// 2018 data
tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country=="AUS" & year==9	// 1988 data
tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country=="AUS" & year==36	// 2015 data - use for comparing across countries since have 16 observations this year

* For countries other than Australia:
* narrow sample

tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country!="AUS" & year==39 & l_own!=., stats(mean p50 n)	// 2018 data, narrow
tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country!="AUS" & year==9 & l_own!=., stats(mean p50 n) // 1988 data, narrow
table year if country!="AUS", c(mean narrow_ue n narrow_ue)

* wide sample
tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country!="AUS" & year==37, stats(mean p50 n)	// 2016 data - after that sample starts dropping off too fast
tabstat l_DTI pi dHP l_pFallLag dPop l_urban l_liberalise l_y_r r l_stock l_own l_legal l_density age_young age65plus if country!="AUS" & year==20, stats(mean p50 n)	// 1999 data - note short sample to capture more countries
table year if country!="AUS", c(mean l_stock p50 l_stock n l_stock) // annual data for stock, to see when capture enough countries for meaningful comparisons
table year if country!="AUS", c(mean l_y_r p50 l_y_r n l_y_r) // annual data for income, to see when capture enough countries for meaningful comparisons
table year if country!="AUS", c(mean wide_ue n wide_ue)



log close